Modifying Transients for 
Efficient Coding of Audio 

Renat Vafin, Richard Heusdens, W. Bastiaan Kleijn 



Summary 

We propose an algorithm for efficient representation of transients in audio 
signals. We estimate the transient part in an original audio signal and modify 
the locations of the transients in such a way that the transients can occur 
only at locations specified by a time grid. This procedure allows an efficient 
representation of transients with damped sinusoids. We also verify that the 
introduced modifications do not result in a perceptual difference between the 
original and modified audio signal. 

1 Introduction 

Parametric coding of audio is a popular tool for representing audio signals 
at very low bit rates [1, 2, 3, 4]. In a parametric audio coder a signal is 
represented with a source model, and parameters of the model are estimated 
and encoded. Psychoacoustic rules are used to control the estimation and 
encoding process in order to exploit irrelevancy present in the signal. 

Different source models can be appropriate for different audio signals. 
The ultimate goal is to obtain an efficient representation of an audio signal 
while meeting perceptual quality requirements. A popular representation 
of audio signals is based on decomposition of an original signal into three 
parts: a transient part, a sinusoidal part and a noise part. Each part is 
then represented by a corresponding set of parameters [1, 3, 4]. Having a 
dedicated model for the transient part has proved beneficial for portions of 
audio signals with sharp attacks, because sinusoidal and noise models cannot 
represent such events efficiently [5]. 

We present a method that improves the efficiency of representing the 
transient part of an audio signal. It is shown in [6] that transients can be 
efficiently modeled using exponentially-modulated sinusoids (below we refer 
to exponentially-modulated sinusoids as damped sinusoids, assuming decaying 
amplitudes). In that work, an audio signal is analyzed on a 
segment-by-segment basis, and each segment is represented as a sum of damped 
sinusoids. A problem arises when a transient occurs in the middle of a 
segment: compared to the case when a transient occurs at the beginning of a 
segment, the number of damped sinusoids needed to model the transient well 
increases considerably. A possible solution is to allow a variable 
segmentation of the signal, such that transients always occur at the 
beginning of the segments. However, the search for the optimal segmentation 
introduces an additional computational cost. Furthermore, additional side 
information is required to describe the segmentation. 

In our new approach we use a fixed segmentation and modify the tran- 
sient part of the audio signal instead. The modified transient signal can be 
efficiently represented by damped sinusoids. Specifically, we proceed in the 
following steps: 

1. Estimate the transient part in the original audio signal and subtract 
it from the original audio signal to form a residual signal. (Next, the 
residual signal can be analyzed by sinusoidal and noise models). 

2. Modify the locations of the estimated transients in such a way that the 
transients can only occur at locations specified by a time grid. 

3. Model the modified transient signal with damped sinusoids. 

We also verify that when the modified transient signal is added to the 
residual signal obtained in step 1, there is no perceptual difference between 
the resulting signal and the original audio signal. We can therefore claim 
that the introduced difference in transient locations is not of perceptual 
importance, i.e. it is irrelevant. Consequently, no additional side 
information cost is introduced by the modification. 

In order to modify transient locations we first have to estimate the 
transient part in the original audio signal. Different transient models have 
been used in parametric coding of audio [1, 2, 7]. For our purpose, a 
transient model should provide a good approximation of the transient part, 
and have a lower complexity than the damped sinusoids model with variable 
segmentation mentioned above. In our current work, we use the elegant 
transient model based on the duality between the time and frequency domains, 
presented in [7]. Our transient modification approach based on this model is 
described in the next section. 

Figure 1: Positions of sinusoids in the frequency-domain segment. 

2 Modification of transient locations 

2.1 Transient estimation and modification 

The transient estimation model presented in [7] is based on the duality 
between the time and frequency domains. A delta-impulse in the time domain 
corresponds to a sinusoid in the frequency domain. Furthermore, a sharp 
transient in the time domain corresponds to a frequency-domain signal which 
can be represented efficiently by a sum of sinusoids. Specifically, the 
transients are estimated in the following steps: 

1. The discrete cosine transform (DCT) is used to transform a signal to 
the frequency domain. The DCT size should be sufficiently large to 
ensure that a transient is a short event in the time-domain segment 
(thus, its transform to the frequency domain can be modeled efficiently 
by sinusoids). A block size of about 1 s is sufficient. 

2. The frequency-domain (DCT-domain) signal is analyzed with a sinusoidal 
model. An iterative matching pursuit algorithm is used for the analysis. 
In our current work we use a consistent iterative sinusoidal 
analysis-synthesis with Hanning-windowed sinusoids as presented in [8]. 
Positions of sinusoids in the frequency-domain segment are illustrated 
in figure 1. 
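
The duality used in step 1 can be checked numerically. The sketch below is 
only an illustration, not the analysis of [7]: it assumes an unnormalized 
DCT-II (the text does not say which DCT variant is used), a toy block size 
N = 64 instead of about 1 s of audio, and a hypothetical helper named dct2. 
Under these assumptions, a unit impulse at sample n0 transforms into a pure 
cosine over the DCT bins. 

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    N = len(x)
    n = np.arange(N)            # time index (columns of the transform matrix)
    k = n.reshape(-1, 1)        # DCT bin index (rows of the transform matrix)
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ x

N = 64                          # toy DCT size, chosen for illustration only
n0 = 10                         # impulse location in the time-domain segment
x = np.zeros(N)
x[n0] = 1.0

X = dct2(x)

# Under the assumed DCT-II, the impulse becomes a cosine over the bins k
# whose frequency depends only on the impulse location n0.
omega = np.pi * (2 * n0 + 1) / (2 * N)
expected = np.cos(omega * np.arange(N))
match = np.allclose(X, expected)
```

A sharp but longer transient would give a sum of a few such cosines with 
nearby frequencies, which is what the sinusoidal analysis in step 2 picks up. 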

As a result of the above procedure a transient is represented by the 
sinusoidal parameters (amplitudes, frequencies, phases) obtained in step 2. 
The information about the location of a transient in a time-domain segment is 
contained in the frequency parameters of the sinusoids corresponding to the 
transient. A transient at the beginning of a segment results in low 
sinusoidal frequencies, while a transient at the end of a segment results in 
high sinusoidal frequencies. The frequency resolution of the sinusoidal model 
depends on the required resolution in estimating transient locations. If the 
required time resolution is one sample, then the required frequency 
resolution is defined by the reciprocal of the DCT size. 
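
The relation between location and frequency can be written down explicitly 
once a DCT convention is fixed. The sketch below assumes a DCT-II, for which 
an impulse at sample n0 gives the frequency pi * (2 * n0 + 1) / (2 * N) in 
radians per DCT bin; the function names and the value N = 44100 (1 s at an 
assumed 44.1 kHz sampling rate) are illustrative and not taken from the text. 

```python
import math

def location_to_frequency(n0, N):
    """Frequency (radians per DCT bin) of the cosine produced by an impulse at n0."""
    return math.pi * (2 * n0 + 1) / (2 * N)

def frequency_to_location(omega, N):
    """Inverse mapping: time-domain location implied by a DCT-domain frequency."""
    return omega * N / math.pi - 0.5

N = 44100                                   # assumed DCT size (1 s at 44.1 kHz)
low = location_to_frequency(0, N)           # transient at the start: near 0
high = location_to_frequency(N - 1, N)      # transient at the end: near pi

# Moving the transient by one sample changes the frequency by pi / N,
# i.e. the required resolution scales with the reciprocal of the DCT size.
one_sample_step = location_to_frequency(1, N) - location_to_frequency(0, N)
```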

Due to the duality between the transient location in a time-domain segment 
and the frequencies of the corresponding sinusoids, the obvious way to modify 
the transient location is to modify the corresponding frequencies (plus a 
correction in the phase parameters). Let us denote by $n_0$ the transient 
location in the time-domain segment and by $\{A_{ij}, \omega_{ij}, 
\phi_{ij}\}$ the sinusoidal parameters corresponding to the transient. Here 
the index $i$ is the sinusoid index in a sinusoidal segment and the index $j$ 
is the sinusoidal segment index in a DCT-domain segment (both indices have 
the first value 1). Then in order to modify the transient location by 
$\Delta n$ (a positive $\Delta n$ corresponds to a shift backward in time and 
a negative $\Delta n$ to a shift forward in time), the frequencies 
$\omega_{ij}$ and phases $\phi_{ij}$ should be modified as follows: 

$$\hat{\omega}_{ij} = \omega_{ij} - \frac{\Delta n \, \pi}{N} \qquad (1)$$

$$\hat{\phi}_{ij} = \phi_{ij} - \frac{\Delta n \, \pi}{N} \cdot \frac{jL}{2} \qquad (2)$$

where $N$ is the DCT size, $L$ is the length of the sinusoidal segments used 
in the analysis of the DCT-domain segment and $L/2$ is the shift between the 
sinusoidal segments. In the sinusoidal segments the phase 0 is defined to be 
in the middle of the segments. No modification of the amplitudes $A_{ij}$ is 
needed. 
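
The frequency and phase update can be tried out on a single impulse. The 
sketch below rests on several assumptions that are not stated in the text: a 
DCT-II convention, a segment centre at DCT bin j * L / 2 (the centre of the 
j-th segment when j starts at 1 and segments advance by L / 2), and an update 
that subtracts pi * delta_n / N from every frequency and pi * delta_n / N 
times the segment centre from every phase. The helper shift_transient and 
all numeric values are hypothetical. 

```python
import numpy as np

def shift_transient(omega, phi, j, delta_n, N, L):
    """Shift a transient by delta_n samples (positive = earlier in time).

    Assumes the phase of segment j (1-based) is referred to DCT bin j * L / 2.
    """
    d_omega = np.pi * delta_n / N
    return omega - d_omega, phi - d_omega * j * L / 2

N, L = 1024, 64                 # toy DCT size and sinusoidal segment length
n0, delta_n = 300, 44           # move the impulse from sample 300 to 256
j = 5                           # 1-based sinusoidal segment index
k_c = j * L / 2                 # assumed centre of segment j, in DCT bins
k = np.arange(k_c - L / 2, k_c + L / 2)    # bins covered by segment j

# Parameters of the single cosine that an impulse at n0 produces.
omega = np.pi * (2 * n0 + 1) / (2 * N)
phi = omega * k_c               # phase referred to the segment centre

omega_new, phi_new = shift_transient(omega, phi, j, delta_n, N, L)

# The modified segment should equal the cosine of an impulse at n0 - delta_n.
segment_new = np.cos(omega_new * (k - k_c) + phi_new)
target = np.cos(np.pi * (2 * (n0 - delta_n) + 1) / (2 * N) * k)
shapes_agree = np.allclose(segment_new, target)
```

Because every sinusoid of a transient receives the same frequency offset, 
the amplitudes stay untouched and only the location moves. 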

Note that the above procedure is different from independent quantization 
of the sinusoidal parameters corresponding to the transient. All frequencies 
are modified by the same amount. This, together with the phase correction of 
equation (2), ensures that the shape of the time-domain transient is 
preserved; only the location is modified. The shift $\Delta n$ is defined by 
the difference between the original location $n_0$ and the closest allowed 
location from a time grid. 
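
A minimal sketch of how the shift could be computed, assuming a uniform grid 
that starts at sample 0 and a grid spacing expressed in samples. The function 
name grid_shift and the example spacing of 256 samples are illustrative 
only. 

```python
import math

def grid_shift(n0, grid_step):
    """Return (delta_n, nearest), with delta_n = n0 - nearest grid location.

    A positive delta_n moves the transient backward in time (earlier),
    a negative one moves it forward, matching the sign convention above.
    """
    nearest = math.floor(n0 / grid_step + 0.5) * grid_step
    return n0 - nearest, nearest

# Transient at sample 300 on a 256-sample grid: the nearest grid point is
# 256, so the transient is moved 44 samples earlier.
delta_n, nearest = grid_shift(300, 256)
```

With this rule the magnitude of the shift never exceeds half the grid 
spacing. 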

2.2 Details of the modification procedure 

• Because the DCT size is large (1 s), more than one transient can occur 
in a time-domain segment. In this case, the model has to identify 
the sinusoidal parameters corresponding to different transients. This is 
done by declaring close sinusoidal frequencies $\omega_{ij}$ to represent the 
same transient. Specifically, two sinusoids having frequencies differing by 
not more than $\epsilon_\omega$ are declared to represent the same transient 
and two sinusoids having frequencies differing by more than 
$\epsilon_\omega$ are declared to represent different transients. Then the 
locations of all transients are modified separately. 

• A transient can occur at the very beginning or at the very end of a 
time-domain segment. Then the modification of sinusoidal frequencies 
can yield frequencies below 0 or above $\pi$. This results in a distortion of 
the shape of the time-domain transient. Simply cancelling such sinusoids 
will not preserve the transient shape, because some information 
describing the transient is not used. 

Instead, we use a solution that preserves the transient shapes. We 
allow an overlap between time-domain segments (about 0.1 ms). In 
this case a transient can appear in two overlapping segments, i.e. in 
the region of mutual overlap. Because the overlap is sufficiently large, if 
the transient is located very close to a border of one of the overlapping 
segments, then it is located at a safe distance from a border of the other 
segment. It is straightforward to identify the transient location from the 
sinusoidal frequencies, so the estimated sinusoidal frequencies in the two 
overlapping segments make it easy to identify when a transient is 
represented in both segments. If such a situation occurs, we cancel the 
corresponding sinusoids in one of the two segments, namely the one where the 
transient is closer to a border. 

• A typical transient lasts for more than one time sample. A natural 
question is then what is the location $n_0$ of the transient. In our current 
approach, we identify the set of frequencies $\omega_{ij}$ representing the 
transient and then define the most common frequency value $\omega_0$ in the 
set. The location of the transient is then found from $\omega_0$ using the 
duality between the time domain and frequency domain. Currently, we are 
working on improving the estimation of $n_0$. 
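
The three points above can be sketched in a few lines. This is an 
illustration under stated assumptions, not the authors' implementation: 
frequencies are grouped by chaining sorted neighbours whose gap does not 
exceed a threshold eps (the pairwise rule in the text does not say how chains 
are handled), "most common frequency" is taken after mapping each frequency 
to the nearest sample location under a DCT-II convention, and all function 
names and numbers are hypothetical. 

```python
from collections import Counter
import math

def group_frequencies(freqs, eps):
    """Chain sorted frequencies; a gap larger than eps starts a new transient."""
    groups = []
    for w in sorted(freqs):
        if groups and w - groups[-1][-1] <= eps:
            groups[-1].append(w)
        else:
            groups.append([w])
    return groups

def estimate_location(group, N):
    """Most common location implied by the frequencies of one transient."""
    candidates = Counter(round(w * N / math.pi - 0.5) for w in group)
    return candidates.most_common(1)[0][0]

def segment_to_cancel(n_first, n_second, N):
    """Index (0 or 1) of the overlapping segment in which the transient is
    closer to a border; its sinusoids are the ones to cancel."""
    d_first = min(n_first, N - 1 - n_first)
    d_second = min(n_second, N - 1 - n_second)
    return 0 if d_first < d_second else 1

N = 1024                                    # toy DCT size

def freq(n):
    # Frequency of an impulse at sample n under the assumed DCT-II.
    return math.pi * (2 * n + 1) / (2 * N)

# Five sinusoids: three near sample 100 and two near sample 700.
freqs = [freq(700), freq(100), freq(101), freq(100), freq(700)]
groups = group_frequencies(freqs, eps=5 * math.pi / N)
locations = [estimate_location(g, N) for g in groups]

# A transient 3 samples from the end of one segment and 95 samples into
# the next one: the first segment's copy is cancelled.
cancel = segment_to_cancel(1020, 95, N)
```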

Figure 2: Castanet signal. 



2.3 Results 

Our preliminary experiments were performed on the Castanet signal 
(figure 2). This signal contains short transients and they are well described 
by the transient estimation technique of [7]. The transient signal was 
estimated, and the residual signal was obtained as the difference between the 
original audio signal and the transient signal. The transient signal was 
modified such that the transients were allowed to occur only at locations 
corresponding to a time grid of 5 ms or 10 ms. In both cases, the 
modifications did not result in a perceptual difference between the original 
and modified transient signals. However, when a modified transient signal was 
added to the residual signal, a slight perceptual difference with the 
original signal was noticed: we could hear a distortion before transients. We 
believe that the problem comes from the sinusoidal analysis of the DCT-domain 
signals, and that the incorporation of the tonality measure defined in [9] 
and used in [7] will solve the problem. The standard [9] is requested from 
Philips. 
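
To give a feeling for the size of the modifications, the grid spacings can 
be converted to samples. The sampling rate is not stated in the text; 
44.1 kHz is assumed here purely for illustration, and grid_stats is a 
hypothetical helper. 

```python
def grid_stats(grid_ms, sample_rate):
    """Grid spacing and worst-case transient displacement, both in samples."""
    step = sample_rate * grid_ms / 1000     # grid spacing in samples
    return step, step / 2                   # a transient moves at most half a step

sample_rate = 44100                         # assumed, not given in the text
stats_5ms = grid_stats(5, sample_rate)
stats_10ms = grid_stats(10, sample_rate)
```

Under this assumption a transient is displaced by at most 2.5 ms on the 5 ms 
grid and by at most 5 ms on the 10 ms grid. 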

3 Ongoing work 

• Defining $n_0$ as described in subsection 2.2. 

• Modeling the modified transient signal with damped sinusoids and 
comparing the efficiency of the transient representation with the reference 
method of [7]. The modeling with damped sinusoids can be performed, 
e.g. as described in [6]. 

4 Application areas 

Audio coding. 

Fig. 3 



Fig. 3 shows an audio and/or speech encoder according to an embodiment of 
the invention. This encoder estimates a transient in the signal, modifies a location of 
the estimated transient to obtain a modified transient having a location closer to a grid 
location than the estimated transient location, the grid location being specified by a 
predefined time grid, and includes a representation of the modified transient in the 
encoded signal. 

1. A method of encoding a signal to obtain an encoded signal, the method 
comprising the steps of: 

estimating a transient in the signal, 

modifying a location of the estimated transient to obtain a modified transient 
having a location closer to a grid location than the estimated transient location, the 
grid location being specified by a predefined time grid, and 

including a representation of the modified transient in the encoded signal. 

2. An encoder for encoding a signal to obtain an encoded signal, the encoder 
comprising: 

means for estimating a transient in the signal, 

means for modifying a location of the estimated transient to obtain a modified 
transient having a location closer to a grid location than the estimated transient 
location, the grid location being specified by a predefined time grid, and 

means for including a representation of the modified transient in the encoded 
signal. 

3. An encoded signal comprising representations of modified transients having a 
location approximating a grid location, the grid location being specified by a 
predefined time grid. 

4. A storage medium on which a signal as claimed in claim 3 has been stored. 